This post looks at an affect-rich open-domain conversational model (AAAI 2019). On top of a seq2seq backbone, it adds VAD (Valence, Arousal and Dominance) embeddings, introduces an affective attention mechanism to model the effect of negators and intensifiers, and uses a weighted cross-entropy loss to encourage the model to generate affect-rich words.
Introduction
This paper addresses the following two problems:
- Affect recognition remains difficult because negators and intensifiers can change the polarity and strength of emotion.
- How to account for both grammaticality and emotion at generation time.
Building on the seq2seq model, the paper introduces VAD affective word encodings from psychology; to model negators and intensifiers, the authors propose an affective attention mechanism; finally, a weighted cross-entropy loss encourages the model to generate affect-rich words without hurting language fluency.
Our main contributions are summarized as follows:
- For the first time, we propose a novel affective attention mechanism to incorporate the effect of negators and intensifiers in conversation modeling. Our mechanism introduces only a small number of additional parameters.
- For the first time, we apply weighted cross-entropy loss in conversation modeling. Our affect-incorporated weights achieve a good balance between language fluency and emotion quality in model responses. Our empirical study does not show performance degradation in language fluency while producing affect-rich words.
- Overall, we propose Affect-Rich Seq2Seq (AR-S2S), a novel end-to-end affect-rich open-domain neural conversational model incorporating external affect knowledge. Human preference test shows that our model is preferred over the state-of-the-art baseline model in terms of both content quality and emotion quality by a large margin.
Affect-Rich Seq2Seq Model
Affective Embedding
The model uses VAD affective encodings. VAD stands for three dimensions of emotion (Valence: pleasantness, Arousal: intensity, Dominance: degree of control), each scored in the range [1, 9].
For example, the word “nice” is associated with the clipped VAD values: (V: 6.95, A: 3.53, D: 6.47).
The authors clip the raw VAD scores to [3, 7] to prevent words with extreme (very high or very low) VAD values from being generated repeatedly.
The VAD scores are then normalized ([5, 3, 5] is the score of a neutral word):
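The formula itself did not survive in this note; assuming a simple shift by the neutral point after clipping, it plausibly reads

$$\eta(x) = \mathrm{clip}\big(W(x), 3, 7\big) - [5, 3, 5],$$

where $W(x)$ (a notation assumed here) denotes the raw lexicon VAD scores of word $x$, so that neutral words map to the zero vector.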
The word vector is then concatenated with its VAD encoding to obtain an affect-aware representation:
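The concatenation formula is likewise missing; a plausible form, writing $e(x)$ for the conventional word embedding and $e'(x)$ for the result (notation assumed), is

$$e'(x) = \big[\, e(x);\ \lambda\, \eta(x) \,\big],$$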
where $\lambda \in R_{+}$ is a hyperparameter that adjusts the strength of the affective embedding.
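A minimal numpy sketch of this embedding pipeline (the lexicon `vad_lexicon`, the embedding table `glove`, and the parameter name `lam` are illustrative, not the authors' code):

```python
import numpy as np

NEUTRAL = np.array([5.0, 3.0, 5.0])  # VAD scores of a neutral word

def affective_embedding(word, glove, vad_lexicon, lam=0.1):
    """Concatenate a word's vector with its scaled, normalized VAD scores."""
    raw = np.asarray(vad_lexicon.get(word, NEUTRAL), dtype=float)  # raw VAD in [1, 9]
    clipped = np.clip(raw, 3.0, 7.0)   # avoid extreme VAD values
    eta = clipped - NEUTRAL            # neutral words map to the zero vector
    return np.concatenate([glove[word], lam * eta])
```

Words missing from the lexicon fall back to the neutral point, so their affect channel is all zeros.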
Affective Attention
To incorporate affect into attention naturally, we make the intuitive assumption that humans pay extra attention to affect-rich words during conversations.
The core of affective attention is to add an affective bias term on top of standard seq2seq attention:
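The biased score itself is missing from this note. A plausible reconstruction, assuming a standard Bahdanau-style attention energy plus the affective bias described below, is

$$\tilde{e}_{t't} = v^{\top}\tanh\left(W s_{t'-1} + U h_{t}\right) + \mu(x_{t})\,\big\|\beta \otimes \eta(x_{t})\big\|_{k},$$

where $s_{t'-1}$ is the decoder state, $h_{t}$ is the encoder state of source word $x_{t}$, and the second term is the affective bias,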
where $\otimes$ denotes element-wise multiplication, $||\cdot||_{k}$ denotes the $l_{k}$ norm, and $\beta \in R^{3}$ is a scaling vector with values in [-1, 1].
$\mu(x_{t}) \in [0, 1]$ measures the importance of a word; the authors consider three formulations (sketched below):
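The original three formulas are not preserved in this note. An illustrative reconstruction consistent with the surrounding text (a uniform weight, a smoothed inverse-frequency weight, and its log-scaled variant; the subscripts $u$ and $i$ are assumptions) might be

$$\mu_{u}(x_{t}) = 1, \qquad \mu_{i}(x_{t}) = \frac{a}{a + p(x_{t})}, \qquad \mu_{li}(x_{t}) \propto \log\left(\epsilon + \frac{1}{p(x_{t})}\right),$$

each rescaled to lie in [0, 1],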
where $p(x_{t})$ is the frequency of word $x_{t}$ in the training set, and $a$ and $\epsilon$ are smoothing factors.
We take the log function in $\mu_{li}(x_{t})$ to prevent rare words from dominating the importance.
$\beta$ models the effect of negators and intensifiers on affect polarity:
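The exact definition of $\beta$ is missing here. Given that the mechanism introduces only a small number of extra parameters (see the contributions above), a plausible form assigns each negator and intensifier $w$ a learnable vector $\beta_{w} \in [-1, 1]^{3}$ that is applied when $w$ immediately precedes $x_{t}$ (a reconstruction, not the paper's exact formula):

$$\beta = \begin{cases} \beta_{w}, & \text{if } x_{t-1} = w \text{ is a negator or intensifier} \\ \mathbf{1}, & \text{otherwise.} \end{cases}$$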
Note that our affective attention only considers unigram negators and intensifiers.
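Under the assumptions above, a minimal sketch of the bias computation (the `BETA` table and the `eta` and `importance` helpers are illustrative names):

```python
import numpy as np

# Illustrative, hard-coded scaling vectors; in the model each negator and
# intensifier would have a learnable beta in [-1, 1]^3.
BETA = {
    "not": np.array([-0.8, 0.5, -0.3]),
    "very": np.array([0.9, 0.9, 0.9]),
}

def affective_bias(words, t, eta, importance, k=2):
    """Affective bias added to the attention energy of source word words[t]."""
    beta = BETA.get(words[t - 1], np.ones(3)) if t > 0 else np.ones(3)
    return importance(words[t]) * np.linalg.norm(beta * eta(words[t]), ord=k)
```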
Affective Objective Function
To encourage the generation of affect-rich words, a weighted cross-entropy loss is introduced:
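The loss formula is not preserved in this note. Given the description below (constant weights, positively correlated with the $l_{2}$ norm of a word's VAD strength), a plausible reconstruction with a hypothetical scaling factor $\gamma$ is

$$\mathcal{L} = -\sum_{t} w(y_{t}) \log p(y_{t} \mid y_{<t}, X), \qquad w(y_{t}) = 1 + \gamma\, \|\eta(y_{t})\|_{2}.$$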
Our proposed affective loss is essentially a weighted cross-entropy loss. The weights are constant and positively correlated with VAD strengths in l2 norm. Intuitively, our affective loss encourages affect-rich words to obtain higher output probability, which effectively introduces a probability bias into the decoder language model towards affect-rich words.
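A minimal numpy sketch of such a weighted cross-entropy (the weight form follows the hypothetical reconstruction above; `logp`, `target_ids`, and `eta_norms` are assumed inputs):

```python
import numpy as np

def affective_loss(logp, target_ids, eta_norms, gamma=0.5):
    """Weighted NLL: tokens with larger VAD strength get larger weights.

    logp:       (T, V) log-softmax outputs of the decoder
    target_ids: (T,)   gold token ids
    eta_norms:  (V,)   l2 norm of each vocabulary word's normalized VAD
    """
    weights = 1.0 + gamma * eta_norms[target_ids]          # constant per token
    nll = -logp[np.arange(len(target_ids)), target_ids]    # standard CE terms
    return np.sum(weights * nll)
```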
Experiment
Conclusion
This paper proposes an end-to-end affect-rich open-domain conversational model that incorporates external VAD knowledge, pays extra attention to affective words while accounting for the effect of negators and intensifiers, and uses a weighted cross-entropy loss to encourage the model to generate affect-rich words.